 Littoral Region


A Theoretical Framework for Environmental Similarity and Vessel Mobility as Coupled Predictors of Marine Invasive Species Pathways

Spadon, Gabriel, Vaidheeswaran, Vaishnav, DiBacco, Claudio

arXiv.org Artificial Intelligence

Marine invasive species spread through global shipping and generate substantial ecological and economic impacts. Traditional risk assessments require detailed records of ballast water and traffic patterns, which are often incomplete, limiting global coverage. This work advances a theoretical framework that quantifies invasion risk by combining environmental similarity across ports with observed and forecasted maritime mobility. Climate-based feature representations characterize each port's marine conditions, while mobility networks derived from Automatic Identification System data capture vessel flows and potential transfer pathways. Clustering and metric learning reveal climate analogues and enable the estimation of species survival likelihood along shipping routes. A temporal link prediction model captures how traffic patterns may change under shifting environmental conditions. The resulting fusion of environmental similarity and predicted mobility provides exposure estimates at the port and voyage levels, supporting targeted monitoring, routing adjustments, and management interventions.


Forecasting Empty Container availability for Vehicle Booking System Application

Gouabou, Arthur Cartel Foahom, Al-Kharaz, Mohammed, Hakimi, Faouzi, Khaled, Tarek, Amzil, Kenza

arXiv.org Artificial Intelligence

Container terminals, pivotal nodes in the network of empty container movement, hold significant potential for enhancing operational efficiency within terminal depots through effective collaboration between transporters and terminal operators. This collaboration is crucial for achieving optimization, leading to streamlined operations and reduced congestion, thereby benefiting both parties. Consequently, there is a pressing need to develop the most suitable forecasting approaches to address this challenge. This study focuses on developing and evaluating a data-driven approach for forecasting empty container availability at container terminal depots within a Vehicle Booking System (VBS) framework. It addresses the gap in research concerning optimizing empty container dwell time and aims to enhance operational efficiencies in container terminal operations. Four forecasting models (Naive, ARIMA, Prophet, and LSTM) are comprehensively analyzed for their predictive capabilities, with LSTM emerging as the top performer due to its ability to capture complex time series patterns. The research underscores the significance of selecting appropriate forecasting techniques tailored to the specific requirements of container terminal operations, contributing to improved operational planning and management in maritime logistics.
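
For readers unfamiliar with the simplest of the four models named above, the following is a minimal sketch of a naive (last-value) forecast scored with mean absolute error. It is not the authors' implementation: the function names, the daily counts, and the choice of error metric are all assumptions made for illustration.

```python
from typing import List


def naive_forecast(history: List[float], horizon: int) -> List[float]:
    """Repeat the last observed value for every future step."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute difference between two equal-length series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily counts of available empty containers (not real data).
counts = [120, 135, 128, 140, 150, 145, 138]
train, test = counts[:5], counts[5:]

forecast = naive_forecast(train, horizon=len(test))
mae = mean_absolute_error(test, forecast)
```

A baseline of this kind gives a reference error that more complex models such as ARIMA, Prophet, or an LSTM would need to beat on the same held-out window.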


VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks

Huang, Jingyuan, Huang, Jen-tse, Liu, Ziyi, Liu, Xiaoyuan, Wang, Wenxuan, Zhao, Jieyu

arXiv.org Artificial Intelligence

Visual-Language Models (VLMs) have shown remarkable performance across various tasks, particularly in recognizing geographic information from images. However, significant challenges remain, including biases and privacy concerns. To systematically address these issues in the context of geographic information recognition, we introduce a benchmark dataset consisting of 1,200 images paired with detailed geographic metadata. Evaluating four VLMs, we find that while these models demonstrate the ability to recognize geographic information from images, achieving up to $53.8\%$ accuracy in city prediction, they exhibit significant regional biases. Specifically, performance is substantially higher for economically developed and densely populated regions compared to less developed ($-12.5\%$) and sparsely populated ($-17.0\%$) areas. Moreover, the models exhibit regional biases, frequently overpredicting certain locations; for instance, they consistently predict Sydney for images taken in Australia. The strong performance of VLMs also raises privacy concerns, particularly for users who share images online without the intent of being identified. Our code and dataset are publicly available at https://github.com/uscnlp-lime/FairLocator.


An Orthogonal Polynomial Kernel-Based Machine Learning Model for Differential-Algebraic Equations

Taheri, Tayebeh, Aghaei, Alireza Afzal, Parand, Kourosh

arXiv.org Artificial Intelligence

A system of differential-algebraic equations (DAEs) is a combination of differential equations and algebraic equations, in which the differential equations describe the dynamical evolution of the system and the algebraic equations constrain the admissible solutions. DAEs serve as essential models for a wide array of physical phenomena. They find applications across various domains such as mechanical systems, electrical circuit simulations, chemical process modeling, dynamic system control, and biological simulations. Consequently, solving these intricate differential equations has remained a significant challenge for researchers. To address this, a range of techniques including numerical, analytical, and semi-analytical methods have been employed to tackle the complexities inherent in solving DAEs.


KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation

Li, Haotian, Wang, Lingzhi, Wei, Yuliang, Da Xu, Richard Yi, Wang, Bailing

arXiv.org Artificial Intelligence

Knowledge graph completion is a task that revolves around filling in missing triples based on the information available in a knowledge graph. Among the current studies, text-based methods complete the task by utilizing textual descriptions of triples. However, this modeling approach may encounter limitations, particularly when the description fails to accurately and adequately express the intended meaning. To overcome these challenges, we propose the augmentation of data through two additional mechanisms. Firstly, we employ ChatGPT as an external knowledge base to generate coherent descriptions to bridge the semantic gap between the queries and answers. Secondly, we leverage inverse relations to create a symmetric graph, thereby creating extra labeling and providing supplementary information for link prediction. This approach offers additional insights into the relationships between entities. Through these efforts, we have observed significant improvements in knowledge graph completion, as these mechanisms enhance the richness and diversity of the available data, leading to more accurate results.
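
The second mechanism, adding inverse relations so that every edge can be traversed in both directions, can be sketched as follows. This is an illustrative assumption about how such augmentation is commonly done, not the KERMIT code: the function name, the `_inverse` suffix, and the example triple are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def add_inverse_triples(triples: List[Triple], suffix: str = "_inverse") -> List[Triple]:
    """Return the original triples plus one reversed triple per original.

    For each (head, relation, tail) a new triple (tail, relation + suffix, head)
    is appended, so a link prediction model also sees tail-to-head examples.
    """
    augmented = list(triples)
    seen = set(triples)
    for head, relation, tail in triples:
        inverse = (tail, relation + suffix, head)
        if inverse not in seen:  # avoid duplicating an existing triple
            augmented.append(inverse)
            seen.add(inverse)
    return augmented


# Hypothetical example triple.
graph = [("Paris", "capital_of", "France")]
augmented_graph = add_inverse_triples(graph)
```

Each original triple contributes one extra labeled example, which is the sense in which the augmentation creates additional supervision for link prediction.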


SERENGETI: Massively Multilingual Language Models for Africa

Adebara, Ife, Elmadany, AbdelRahim, Abdul-Mageed, Muhammad, Inciarte, Alcides Alcoba

arXiv.org Artificial Intelligence

Multilingual pretrained language models (mPLMs) acquire valuable, generalizable linguistic information during pretraining and have advanced the state of the art on task-specific finetuning. To date, only ~31 out of ~2,000 African languages are covered in existing language models. We ameliorate this limitation by developing SERENGETI, a massively multilingual language model that covers 517 African languages and language varieties. We evaluate our novel models on eight natural language understanding tasks across 20 datasets, comparing to 4 mPLMs that cover 4-23 African languages. SERENGETI outperforms other models on 11 datasets across the eight tasks, achieving an average $F_1$ of 82.27. We also perform analyses of errors from our models, which allows us to investigate the influence of language genealogy and linguistic similarity when the models are applied under zero-shot settings. We will publicly release our models for research at https://github.com/UBC-NLP/serengeti.


AfroLID: A Neural Language Identification Tool for African Languages

Adebara, Ife, Elmadany, AbdelRahim, Abdul-Mageed, Muhammad, Inciarte, Alcides Alcoba

arXiv.org Artificial Intelligence

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural LID toolkit for 517 African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 language families utilizing five orthographic systems. When evaluated on our blind Test set, AfroLID achieves a 95.89 $F_1$-score. We also compare AfroLID to five existing LID tools that each cover a small number of African languages, finding it to outperform them on most languages. We further show the utility of AfroLID in the wild by testing it on the acutely under-served Twitter domain. Finally, we offer a number of controlled case studies and perform a linguistically-motivated error analysis that allow us to showcase both AfroLID's powerful capabilities and its limitations.


10 Simple Things to Try Before Neural Networks - KDnuggets

#artificialintelligence

It is not always the big stuff or the latest packages that help improve the accuracy or performance of our machine learning models. At times we overlook the basics of machine learning and rush to higher-order solutions when the solution is right there in front of us. Below are 10 simple things you should remember to try first before throwing in the towel and jumping straight to RNNs and CNNs (of course, there are datasets that merit starting straight from LSTMs and BERT). Let us remind ourselves of our checklist before bringing out our calculus skills. Try to understand as much about the domain as you can.


Will Data Analysts be Replaced by AI? - KDnuggets

#artificialintelligence

It is true that parts of data analytics, such as visualization and reporting, are being automated every day, and I believe the trend will continue well into the future. I can understand this statement because many professions, such as call center agents, are being replaced by chatbots, and aspiring data analysts are afraid that if they start learning data analytics, in a few years' time they too will be out of work. What is good about AI? When used on routine and repetitive tasks, AI is very efficient. Finally, when we look at the latest developments in "Strong" AI such as GANs, we see that machines can really take us to places we never imagined.


Automatic Speech Recognition using limited vocabulary: A survey

Fendji, Jean Louis K. E., Tala, Diane M., Yenke, Blaise O., Atemkeng, Marcellin

arXiv.org Artificial Intelligence

Automatic Speech Recognition (ASR) is an active field of research due to its huge number of applications and the proliferation of interfaces and computing devices that can support speech processing. However, the bulk of applications are based on well-resourced languages, which overshadow under-resourced ones. Yet ASR represents an undeniable means of promoting such languages, especially when designing human-to-human or human-to-machine systems involving illiterate people. One approach to designing an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on the recognition of a small number of words or sentences. This paper aims to provide a comprehensive view of the mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possible future directions in ASR using a limited vocabulary. This work consequently offers guidance for designing an ASR system using a limited vocabulary. Although the emphasis is on limited vocabulary, most of the tools and techniques reported in this survey apply to ASR systems in general.